Java Basics: Regular Expressions for Extracting HTML Tag Elements

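1. Extracting elements from tags with attributes, such as <a href="…">~</a>
Java code (uses the Jakarta ORO library, package org.apache.oro.text.regex):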
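// Groups: 1 = tag text before href, 2 = quote character, 3 = href value, 4 = rest of the tag
String patternStrs = "(<\\s*a\\s+(?:[^\\s>]\\s*){0,})href\\s*=\\s*(\"|'|)([^\\2\\s>]*)\\2((?:\\s*[^\\s>]){0,}\\s*>)";
PatternCompiler compiler = new Perl5Compiler();
PatternMatcher matcher = new Perl5Matcher();
// compile() throws MalformedPatternException if the pattern is invalid
Pattern patternForLink = compiler.compile(patternStrs, Perl5Compiler.CASE_INSENSITIVE_MASK);
// htmlContent is the HTML source string to scan
PatternMatcherInput input = new PatternMatcherInput(htmlContent);
while (matcher.contains(input, patternForLink)) {
    MatchResult match = matcher.getMatch();
    String href = match.group(3); // value of the href attribute
}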
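
For readers without Jakarta ORO on the classpath, the same idea can be sketched with the standard java.util.regex package. This is a simplified variant, not the pattern above: java.util.regex rejects a numeric escape such as \2 inside a character class, so the sketch excludes both quote characters from the value instead. The class name LinkExtractor and the method extractHrefs are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Group 1: optional quote character; group 2: the href value.
    // The optional (?:[^>]*?\s)? part skips attributes that precede href.
    private static final Pattern HREF = Pattern.compile(
            "<\\s*a\\s+(?:[^>]*?\\s)?href\\s*=\\s*([\"']?)([^\"'\\s>]*)\\1[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Returns the href value of every <a> tag found in the given HTML text.
    static List<String> extractHrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            hrefs.add(m.group(2));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"https://example.com/a\">A</a> "
                + "<A class='x' HREF='/b'>B</A> <a href=/c>C</a></p>";
        System.out.println(extractHrefs(html)); // prints [https://example.com/a, /b, /c]
    }
}
```

As with the ORO pattern, attribute values that contain whitespace are not matched, because whitespace is excluded from the value group.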

2. Extracting elements from tags with attributes, such as <img src="…">
(<\s*img\s+(?:[^\s>]\s*){0,})src\s*=\s*("|'|)([^\2\s>]*)\2((?:\s*[^\s>]){0,}\s*>)
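
The patterns in sections 1 and 2 differ only in the tag name and the attribute name, so one way to avoid maintaining two near-identical expressions is to build the pattern from those two names. The following is a minimal sketch using the standard java.util.regex package rather than Jakarta ORO; the class AttributeExtractor and the method attributeValues are hypothetical names, and the expression is a simplified variant of the one above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeExtractor {
    // Returns the value of attribute `attr` for every `tag` element in `html`.
    // Pattern.quote keeps the tag and attribute names from being read as regex syntax.
    static List<String> attributeValues(String html, String tag, String attr) {
        Pattern p = Pattern.compile(
                "<\\s*" + Pattern.quote(tag) + "\\s+(?:[^>]*?\\s)?" + Pattern.quote(attr)
                        + "\\s*=\\s*([\"']?)([^\"'\\s>]*)\\1[^>]*>",
                Pattern.CASE_INSENSITIVE);
        List<String> values = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            values.add(m.group(2)); // group 2 is the attribute value without quotes
        }
        return values;
    }

    public static void main(String[] args) {
        String html = "<img src=\"a.png\" alt=\"x\"><IMG alt='y' SRC='b.jpg'>";
        System.out.println(attributeValues(html, "img", "src")); // prints [a.png, b.jpg]
    }
}
```

Calling attributeValues(html, "a", "href") would cover the link case from section 1 with the same code.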


Posted by arkgame